Abstract. We describe a new computational analysis technique to identify
lattices and structural defects in large-scale atomistic computer simulations of 
crystalline materials. Our approach is based on a user-supplied catalog of template 
structures. We use a graph-based pattern matching algorithm to find occurrences 
of periodic and non-periodic atomic arrangements in atomistic snapshots, and 
to generate a high-level description of a simulated microstructure. The method 
covers defects such as stacking faults, grain boundaries, crystal interfaces, point 
defects, and defect clusters in a wide range of crystal lattices. In contrast to 
existing methods, the proposed pattern matching algorithm is able to identify 
crystal structures with a polyatomic basis. Furthermore, we discuss how the local 
lattice orientation can be determined to measure crystal rotations, and how a 
defective crystal can be mapped to an ideal reference state. Finally, we derive a 
computational method for detecting and characterizing disclination defects in the 
analysis data. 
1. Introduction

Atomistic simulation methods such as molecular dynamics (MD), molecular statics,
and Monte Carlo schemes are routinely used to study crystalline materials at the
atomic scale. As crystal defects play a critical role in determining most materials
properties, they have been the subject of a large number of simulation
studies. In this paper we present a set of new computational analysis techniques 
that allow one to extract and characterize structural defects in large-scale atomistic 
computer simulations of crystalline materials in a fully automated fashion. The 
described methods cover defects such as stacking faults, grain boundaries, coherent 
crystal interfaces, point defects, and defect clusters in a wide range of crystal lattices. 

In recent years, the availability of improved simulation methods, better 
interatomic potentials, and increased computing power has led to the expectation that 
atomic-scale computer simulations should yield exact quantitative output in addition 
to qualitative insights into physical processes. Obtaining quantitative data from, for 
instance, a molecular dynamics simulation usually requires processing the raw atomic 
trajectories in a specific way, thereby extracting information on the number, evolution, 
and interaction of important features such as crystal defects. Such data can then be
used to interpret experimental measurements or, perhaps more importantly, to provide
input to coarse-grained models of materials behavior.

Existing analysis techniques such as the common neighbor analysis (CNA) [1] and
the centrosymmetry parameter (CSP) [2] are conventionally used in simulation studies
of crystal plasticity to filter out atoms that form a perfect lattice, thereby revealing
crystal defects such as dislocations and grain boundaries. Even though this is usually
sufficient for the scientist to visualize and interpret the important physical
processes [3], it does not yield an abstract description of the system: these analysis
methods retain the purely atomistic representation of the crystal, and each atom is
processed and classified independently. In particular, these simple techniques do not
produce a higher-level (that is, more abstract) description of the relevant crystal
features (i.e. grains and lattice defects) formed by the atoms.
Therefore, novel data-reduction approaches need to be developed that are able
to identify and characterize extended crystal defects efficiently and accurately, and
transform them into discrete objects that are accessible to quantitative analyses.
Such approaches are needed to keep pace with the increasing complexity of large-scale MD
simulations, which model the interaction of a large variety and a large number of
such crystal defects.

In this paper we describe new computational techniques for the fully-automated 
identification of grains with arbitrary lattice structures, coherent crystal interfaces 
including grain boundaries, planar stacking faults, and zero-dimensional crystal defects 
in atomistic simulation data. They are complemented by an already available analysis
method for dislocations [4, 5]. In addition to identifying and classifying defects in
the atomistic input data, the goals of the techniques developed in this paper are to
determine the lattice orientation and shape of individual grains in a polycrystal, to 
determine the orientation of defects with respect to the parent lattice, and to measure 
elastic strains in the material. Another application described in this paper is the 
detection of disclinations in the crystal. 

The data extracted and generated from an atomistic snapshot should be as 
comprehensive as possible to support unforeseeable types of queries conceived by the 
users. The general aim of the present effort is to reach a level of representation that is 
more abstract than the overloaded, fully atomistic description of the microstructure, 
but which retains all its important features such that, in principle, the atomistic 
system could be reconstructed from it (lossless data reduction). 

A second design goal for an analysis code is universality. That is, the set of 
recognized lattices and defects is not hard-coded into the algorithm, but is extensible 
by the user, who may add new lattice types, interfaces and other defects to the 
search catalog simply by providing a template of the structure. From this template 
the algorithm automatically generates a characteristic pattern that is used to find 
occurrences of the structure in the actual simulation data. 

We give an overview of the most commonly used methods for extracting structural
features from atomistic simulation snapshots of crystalline materials in the following
sections. We then discuss their shortcomings and limitations in section 1.2, which
motivated the development of the new techniques presented in sections 2 and 3.



1.1. Existing analysis techniques 

Here, we focus on analysis techniques for crystalline materials only. A broader overview
of structural characterization methods for general particle systems can be found in [6].

1.1.1. Energy filtering The potential energy of an atom can be used as a simple
indicator to decide whether it forms a perfect lattice with its neighbors. Given that 
crystal defects are usually higher in energy than the perfect lattice, one can detect 
the former by applying a threshold criterion to the atomic energies: Atoms having 
a potential energy above the threshold value are considered defect atoms, while low- 
energy atoms are classified as regular crystalline atoms. 
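The threshold criterion can be written in a few lines. This is a minimal sketch, assuming the per-atom potential energies are already available as a list; the function name `classify_by_energy`, the threshold and the energies in the example are hypothetical values chosen for illustration.

```python
def classify_by_energy(energies, threshold):
    """Label each atom 'defect' if its potential energy exceeds the threshold,
    and 'crystal' otherwise (simple energy-filtering criterion)."""
    return ["defect" if e > threshold else "crystal" for e in energies]


# Hypothetical per-atom potential energies (eV) and a user-chosen threshold.
energies = [-3.54, -3.53, -3.20, -3.54]
labels = classify_by_energy(energies, threshold=-3.45)
print(labels)  # ['crystal', 'crystal', 'defect', 'crystal']
```

The single scalar threshold is exactly what makes the method fragile: any strain or thermal contribution that shifts a lattice atom's energy above it produces a false defect.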

Several shortcomings of this simple method have contributed to the fact that it is 
rarely used nowadays. Since both the perfect lattice state and the defect state of the 
crystal are usually local energy minima, the perfect lattice can remain stable at atomic 
energies close to or even above the defect energy. In particular, the discrimination 
becomes unreliable if the energy ranges of lattice and defect atoms overlap due to the 
effects of elastic strain energy or thermal energy. 

Moreover, the potential energy of individual atoms is specific to the employed 
interaction model, and, for some interatomic potentials and quantum mechanical 
descriptions, is not computable at all. This is why one usually prefers structural 
analysis methods, which compare the spatial arrangement of atoms to a reference 
configuration. 

1.1.2. Centrosymmetry parameter The centrosymmetry property of some lattices
such as face-centered cubic (fcc) and body-centered cubic (bcc) can be used to
distinguish them from other structures such as crystal defects where the symmetry
is broken. Kelchner et al. [2] have developed a metric, the so-called centrosymmetry
parameter (CSP), that quantifies the local loss of centrosymmetry at an atomic site,
which is characteristic of most crystal defects.

The CSP of an atom having N nearest neighbors is defined as

$$\mathrm{CSP} = \sum_{i=1}^{N/2} \left| \mathbf{r}_i + \mathbf{r}_{i+N/2} \right|^2, \qquad (1)$$

where $\mathbf{r}_i$ and $\mathbf{r}_{i+N/2}$ are vectors from the central atom to a pair of opposite
neighbors. Practical ways of finding these pairs are described in the accompanying
documentation of the visualization program AtomEye [7] and the molecular dynamics
code LAMMPS [8].
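Equation (1) can be sketched in Python as follows. Finding the opposite pairs is the non-trivial part; here we assume a simple greedy heuristic (our choice for illustration, not necessarily the scheme used by AtomEye or LAMMPS): all pair sums $|\mathbf{r}_i + \mathbf{r}_j|^2$ are sorted and the smallest ones are accepted as long as neither neighbor is already paired. Neighbor vectors are plain 3-tuples, and the function name `centrosymmetry` is hypothetical.

```python
from itertools import combinations, product


def centrosymmetry(neighbors):
    """CSP of equation (1) for an even number of neighbor vectors (3-tuples),
    pairing opposite neighbors greedily by the smallest |r_i + r_j|^2."""
    n = len(neighbors)
    if n % 2:
        raise ValueError("the CSP requires an even number of neighbors")
    candidates = []
    for i, j in combinations(range(n), 2):
        s = [a + b for a, b in zip(neighbors[i], neighbors[j])]
        candidates.append((sum(c * c for c in s), i, j))
    candidates.sort()
    paired, csp = set(), 0.0
    for value, i, j in candidates:
        if i not in paired and j not in paired:
            paired.update((i, j))
            csp += value
    return csp


# fcc nearest neighbors, a/2 <110> with a = 2 (integer coordinates).
fcc = []
for x, y in product((-1, 1), repeat=2):
    fcc += [(x, y, 0), (x, 0, y), (0, x, y)]

print(centrosymmetry(fcc))                 # 0.0 for the perfect lattice
distorted = [(-1.1, -1.0, 0.0)] + fcc[1:]  # displace one neighbor slightly
print(centrosymmetry(distorted))           # small positive value (about 0.01)
```

Because every pair is eventually considered, the greedy loop always pairs all neighbors when N is even.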

The main advantage of the CSP is that it is affected only marginally by elastic 
distortions of the crystal. In particular, any affine deformation of the lattice does not 
change its degree of centrosymmetry at all. The CSP is, however, sensitive to random 
thermal displacements of atoms. Being only a one-dimensional measure, the CSP's 
ability to differentiate between different defect structures is rather weak. The noise 
induced by thermal displacements and inhomogeneous elastic strain may well dominate 
any characteristic differences between structures. Naturally, only centrosymmetric 
crystal lattices can be properly treated with this method, and it provides no means of 
differentiating between several fully centrosymmetric crystal phases. 

1.1.3. Common neighbor analysis Analysis methods that employ more complex, 
multi-dimensional signatures to characterize arrangements of atoms are usually better 
in discriminating between several structures. A popular method of this kind is the 
common neighbor analysis (CNA) [1]. Particular to the CNA is that the characteristic
signature is not a direct function of the atomic coordinates. Instead, the CNA is based
on an abstract representation of the crystal's topology, the network of bonds between
atoms.

Usually, two atoms are said to be near-neighbors (i.e. bonded) if they are within 
a specified cutoff distance of each other. For densely packed structures the cutoff 
distance lies halfway between the first and second neighbor shell of the lattice under 
consideration, while for the bcc lattice (and other more open lattices) several shells 
need to be taken into account. 

To assign a local crystal structure to an atom, three characteristic numbers are 
computed for each of the N neighbor bonds of the central atom: The number of 
common neighbors, the total number of bonds between these common neighbors, and 
the number of bonds in the longest chain of bonds between the common neighbors. 
This gives N triplets (ijk), which are compared to the reference patterns of typical
lattice structures (table 1):

Structure   N    CNA signature
fcc         12   12 × (421)
hcp         12   6 × (421), 6 × (422)
bcc         14   8 × (666), 6 × (444)
diamond     16   12 × (543), 4 × (663)

Table 1. CNA signatures for common crystal structures.
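The computation of the triplets can be sketched as follows, assuming the neighbor vectors of the central atom and the bond cutoff are given. Only the central atom's own neighbors are needed, since all common neighbors of a bond belong to this set. For the third number we follow a common implementation choice (an assumption here): the number of bonds in the largest connected cluster of bonds among the common neighbors, which gives 6 for the closed six-membered ring behind the bcc (666) triplet. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations, product


def largest_bond_cluster(bonds):
    """Number of bonds in the largest connected cluster of (a, b) bonds."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in bonds:
        parent[find(a)] = find(b)
    sizes = Counter(find(a) for a, _ in bonds)
    return max(sizes.values(), default=0)


def cna_signature(neighbors, cutoff):
    """Counter of CNA triplets (common neighbors, bonds among them, bonds in
    the largest bond cluster), one triplet per bond of the central atom."""
    n = len(neighbors)
    bonded = [[i != j and math.dist(neighbors[i], neighbors[j]) < cutoff
               for j in range(n)] for i in range(n)]
    triplets = []
    for c in range(n):
        common = [m for m in range(n) if bonded[c][m]]
        bonds = [(a, b) for a, b in combinations(common, 2) if bonded[a][b]]
        triplets.append((len(common), len(bonds), largest_bond_cluster(bonds)))
    return Counter(triplets)


fcc = []
for x, y in product((-1, 1), repeat=2):
    fcc += [(x, y, 0), (x, 0, y), (0, x, y)]
corners = list(product((-1, 1), repeat=3))
axes = [(2, 0, 0), (-2, 0, 0), (0, 2, 0), (0, -2, 0), (0, 0, 2), (0, 0, -2)]

# Cutoffs between the relevant neighbor shells (lattice constant a = 2).
print(cna_signature(fcc, cutoff=(math.sqrt(2) + 2) / 2))  # Counter({(4, 2, 1): 12})
print(cna_signature(corners + axes, cutoff=2.4))          # 8 x (666) and 6 x (444)
```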



The common neighborhood parameter [9] should be mentioned as an alternative
analysis method, which was proposed by Tsuzuki et al. to combine the strengths
of both the CNA and CSP methods. The CNA has also been extended to binary
atomic systems by taking the chemical species of common neighbors into account as
an additional criterion [10]. This extension enables the identification of simple binary
structures such as L1$_0$, L1$_2$, etc.



1.1.4. Bond-angle distribution analysis The bond-angle distribution analysis has
been developed by Ackland and Jones [11] to distinguish fcc, hcp and bcc coordination
structures in an atomistic simulation. To this end, the $N(N-1)/2$ bond angle cosines
$\cos\theta_{ijk}$ of an atom are used to build a histogram, which is then further evaluated
using a set of heuristic decision rules to determine the most likely structure type.
These criteria have been optimized to achieve a robust identification of the most
important crystal structures.
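The first stage of this procedure, building the histogram of bond-angle cosines, can be sketched as follows. This is an illustration only: the equally wide bins are our own choice, and the heuristic decision rules of Ackland and Jones are not reproduced. Function names are hypothetical. For the ideal fcc neighbor shell, the 66 cosines take the values $-1$, $-1/2$, $0$ and $1/2$ with multiplicities 6, 24, 12 and 24.

```python
import math
from itertools import combinations, product


def bond_angle_cosines(neighbors):
    """All N(N-1)/2 cosines cos(theta_ijk) between pairs of bond vectors."""
    cosines = []
    for a, b in combinations(neighbors, 2):
        dot = sum(x * y for x, y in zip(a, b))
        cosines.append(dot / (math.hypot(*a) * math.hypot(*b)))
    return cosines


def cosine_histogram(cosines, nbins):
    """Counts of cosines in nbins equally wide bins covering [-1, 1]."""
    counts = [0] * nbins
    for c in cosines:
        k = min(int((c + 1.0) / 2.0 * nbins), nbins - 1)  # clamp c = 1 into last bin
        counts[max(k, 0)] += 1
    return counts


fcc = []
for x, y in product((-1, 1), repeat=2):
    fcc += [(x, y, 0), (x, 0, y), (0, x, y)]

print(cosine_histogram(bond_angle_cosines(fcc), nbins=5))  # [6, 24, 12, 24, 0]
```

Five bins are used here so that the ideal fcc values fall well inside bins rather than on bin edges, where floating-point rounding could move them.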



1.1.5. Voronoi analysis The Voronoi decomposition has been employed in
simulations of liquids and glasses to study various properties of their atomic structure
[12, 13]. To characterize the local structural topology around a single particle, the
corresponding Voronoi polyhedron is translated into a compact signature by counting
the number of polygonal facets having three, four, five and six vertices. This yields a
vector of four integers, $(n_3, n_4, n_5, n_6)$, that identifies the structural type. For instance,
the Voronoi polyhedron of an fcc lattice atom comprises 12 facets having four vertices
each. Thus the corresponding signature is (0,12,0,0). The polyhedron of a bcc atom
has six facets with four vertices and eight facets with six vertices. The corresponding
signature is (0,6,0,8).
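Once a Voronoi polyhedron has been constructed (with a Voronoi library, not shown here), turning it into the signature is a simple counting step. A minimal sketch, assuming the polyhedron is given as a list holding the number of vertices of each facet; facets with more than six vertices do not enter this four-component signature. The function name is hypothetical.

```python
def voronoi_signature(facet_vertex_counts):
    """Signature (n3, n4, n5, n6): number of facets with 3, 4, 5 and 6 vertices."""
    return tuple(facet_vertex_counts.count(k) for k in (3, 4, 5, 6))


# fcc: rhombic dodecahedron with 12 four-vertex facets.
print(voronoi_signature([4] * 12))           # (0, 12, 0, 0)
# bcc: truncated octahedron with 6 squares and 8 hexagons.
print(voronoi_signature([4] * 6 + [6] * 8))  # (0, 6, 0, 8)
```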

Even though the Voronoi method has been used numerous times for the analysis
of unstructured particle systems such as liquids and glasses, it has rarely been applied
to simulations of crystalline materials. The highly symmetric crystalline packings in
lattices such as fcc and hcp can cause singularities in the Voronoi construction, and
even minor perturbations in the atomic coordinates dramatically change the Voronoi
polyhedra [13]. This sensitivity and the high computational cost of the Voronoi
polyhedron construction make this method less attractive for crystalline systems.



1.2. Limitations of existing methods 

The structure identification methods described above are restricted with respect to 
the types of crystal structures that can be identified: 

(i) The CSP method requires the local environment to be symmetric, and for the 
CNA neighbors need to be arranged on discrete shells around the central atom. 
Note that neither is generally the case for atoms in crystal defects. 

(ii) The bond-angle distribution method makes use of hard-coded rules to differentiate 
between lattice structures. These optimized rules were obtained heuristically 
for each supported lattice structure to yield optimal results. From a user's 
perspective, however, this manual approach severely limits the range of structures 
that can be treated with this method. 

(iii) The Voronoi method provides no means of consistently controlling the sensitivity 
of the structure identification, as even slightly distorted atomic positions tend to 
change the resulting polyhedra dramatically. 

In addition, the existing crystal structure identification methods have two important 
limitations in common, which will be addressed by the present work: 

(i) The described methods are limited to simple lattice structures with a monatomic 
basis. Even though they can characterize the local coordination structure at 
each lattice site separately, they fail to identify lattices with multiple atoms per 
primitive unit cell and crystal defect structures consisting of several atoms. 

(ii) The described methods determine only whether an arrangement of atoms in 
a simulation snapshot matches a certain reference structure or not. To this 
end, the local arrangement of neighbors is transformed into a characteristic 
signature and compared to a known reference value, yielding either a match 
or a mismatch. While this is sufficient for simple visualization purposes, 
it is insufficient for comprehensive analyses such as those pursued here. For many
applications it is instrumental to associate the individual neighbor atoms with 
the corresponding bonds in the reference structure, i.e. to create a one-to-one 
mapping between the actual crystal and the ideal reference lattice. This extended 
information will enable us, for instance, to track the local crystal orientation, to 
detect dislocations and disclinations, and to calculate the elastic strain field. 

2. Coordination pattern analysis 

2.1. Overview 

We address the aforementioned issues by proposing three computer algorithms, which, 
when put together, will provide a comprehensive analysis tool for simulation snapshots 
of crystalline materials. 

We first describe the so-called neighbor distance analysis (NDA), which has been
devised as an alternative to the conventional structure identification methods reviewed
in section 1.1. While being computationally more expensive, it does not require the
arrangement of neighbor atoms to be symmetric or ordered in a specific way. The NDA
provides a direct way of controlling the sensitivity of the matching process to thermal
displacements and elastic deformations. It is most useful in situations where the
coordination structure of atoms lacks any particular symmetry or shell structure,
as is usually the case in crystal defect cores.

Like other methods discussed above, the NDA operates on individual atoms only. 
In a second processing step, we will apply a pattern recognition method to identify 
extended structures such as complex lattices and grain boundaries that comprise 
multiple atoms. The task of this graph-based pattern search technique, described
in section 3.3, is to find occurrences of periodic and non-periodic template structures
in the simulation data. As a result of this second processing step, the input crystal is
divided into clusters. Each cluster is a connected set of atoms that form a particular 
structure from the reference catalog (e.g. a certain lattice, grain boundary, stacking 
fault etc.) with long-range order. 

In the last step, the orientational relationship between adjacent clusters is
determined (section 4), and an abstract description of the entire microstructure is
assembled. To give an example: If a grain boundary has been identified by the 
pattern recognition algorithm based on its characteristic atomic structure (for instance 
a twin boundary), we can infer the exact crystallographic orientation relationship 
between the two adjacent grains, irrespective of any elastic deformation that might be 
superimposed on the crystal. Ultimately, a high-level description of the polycrystal is 
obtained that no longer consists of individual atoms, but of discrete objects such as 
crystal grains, interfaces, and lattice defects. 

2.2. Formal definition of a coordination pattern 

A coordination pattern $p = (R_p, S_p)$ describes the local arrangement of neighbor atoms
around a central atom in an arbitrary Cartesian coordinate system. The pattern is
specified in terms of a list of bond vectors $R_p = (\mathbf{R}_1, \ldots, \mathbf{R}_N)$ connecting the central
atom with its N nearest neighbors (with N being a freely selectable parameter). The
coordination pattern for fcc lattice sites, for instance, consists of N = 12 neighbors,
with the neighbor vectors comprising the $\frac{a}{2}\langle 110 \rangle$ vector family. Note that the bond
list $R_p$ has an arbitrary but fixed ordering.

The permutation group $S_p = \{\sigma_s\}$ describes the point symmetry of the
coordination pattern, with $\{\sigma_s\}$ being a set of permutations of the numbers $\{1, \ldots, N\}$,
corresponding to permutations of the neighbors. For instance, the twelve neighbors of
an fcc atom can be permuted in 48 equivalent ways, corresponding to the elements
of the $m\bar{3}m$ point symmetry group of cubic crystals. The set $S_p$ can be precomputed
on the basis of $R_p$.
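A brute-force way to precompute $S_p$ for a cubic pattern is to apply all 48 signed permutation matrices of the cubic point group to the bond vectors and record the induced permutation of neighbor indices. The sketch below assumes integer bond vectors aligned with the cubic axes, so that images can be looked up exactly; a general implementation would need a distance tolerance and the actual symmetry operations of the pattern. All names are hypothetical.

```python
from itertools import permutations, product


def cubic_point_group():
    """All 48 signed permutation matrices, stored as (axis permutation, signs)."""
    return [(perm, signs)
            for perm in permutations(range(3))
            for signs in product((1, -1), repeat=3)]


def transform(op, v):
    """Apply a signed permutation matrix to the vector v."""
    perm, signs = op
    return tuple(signs[k] * v[perm[k]] for k in range(3))


def symmetry_permutations(pattern, ops):
    """Set of permutations (tuples: neighbor i -> index of its image) induced
    by those operations that map the pattern onto itself."""
    index = {tuple(v): i for i, v in enumerate(pattern)}
    result = set()
    for op in ops:
        images = [index.get(transform(op, v)) for v in pattern]
        if None not in images:
            result.add(tuple(images))
    return result


fcc = []
for x, y in product((-1, 1), repeat=2):
    fcc += [(x, y, 0), (x, 0, y), (0, x, y)]

S_p = symmetry_permutations(fcc, cubic_point_group())
print(len(S_p))  # 48
```

Since the twelve fcc bond vectors span three-dimensional space, distinct operations always produce distinct permutations, so all 48 elements of $m\bar{3}m$ appear.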

2.3. Coordination pattern matching 

Let $V_a = \{a_i\}$ be the set of atoms in a simulation snapshot to be analyzed. Given an
atom $a \in V_a$ and a coordination pattern p, the task of a coordination pattern matching
algorithm is to determine whether the neighbor atoms of a form a structure that is
sufficiently similar to the ideal arrangement p. The input arrangement to be tested
against the pattern is given as a set of vectors $(\mathbf{r}_1, \ldots, \mathbf{r}_N)$ connecting the central atom
a with its N nearest neighbors $(a_1, \ldots, a_N) \subset V_a$. The nearest neighbors of an atom
in a simulation snapshot can be found by means of a k-d tree data structure [...] and
a recursive k-th nearest neighbor query algorithm [...], or by other means such as the
neighbor lists in a molecular dynamics simulation.
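For small test systems, the N nearest neighbors can also be found by brute force instead of a k-d tree. The sketch below is an O(n) scan per atom, not the k-d tree algorithm mentioned above. It assumes an orthorhombic periodic box and applies the minimum-image convention, which is only valid if the neighbor distances are smaller than half the box length. The function name is hypothetical.

```python
import heapq
import math


def nearest_neighbors(positions, index, k, box):
    """k nearest neighbors of atom `index` as (distance, atom, vector) tuples,
    using the minimum-image convention in an orthorhombic periodic box."""
    center = positions[index]
    candidates = []
    for j, p in enumerate(positions):
        if j == index:
            continue
        d = [pj - cj for pj, cj in zip(p, center)]
        d = [x - length * round(x / length) for x, length in zip(d, box)]
        candidates.append((math.hypot(*d), j, tuple(d)))
    return heapq.nsmallest(k, candidates)


# Simple cubic crystal, 4 x 4 x 4 cells with unit spacing.
positions = [(x, y, z) for x in range(4) for y in range(4) for z in range(4)]
shell = nearest_neighbors(positions, 0, 6, (4, 4, 4))
print([d for d, _, _ in shell])  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

The returned vectors are the bond vectors $\mathbf{r}_i$ in the order of increasing length.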

In section 1.1 we have described various structure matching methods that all
exploit structural symmetries in some way. Instead of directly comparing $(\mathbf{r}_1, \ldots, \mathbf{r}_N)$
to $(\mathbf{R}_1, \ldots, \mathbf{R}_N)$, they condense both sets into characteristic signatures that are
invariant under rotation and can easily be compared. This transformation is
essentially what makes the identification process efficient and robust (see [6] for an
in-depth discussion). Note that, at the same time, this data reduction usually results in
some insensitivity to elastic deformation: small perturbations of the atomic positions
do not change the calculated signature.

2.4. Neighbor distance analysis

In general, the atomic arrangements found in the core regions of crystal defects may 
not exhibit any symmetry or other type of order such as discrete neighbor shells. It 
might therefore be difficult to reduce their description to a small set of characteristic 
numbers (and even less so to a scalar signature like the CSP). In such a case, one
therefore has to resort to a more extensive type of signature, as we propose in the
following.

For this, let us suppose the vectors of the coordination pattern, $R_p =
(\mathbf{R}_1, \ldots, \mathbf{R}_N)$, are ordered according to their distance from the central atom such that
$|\mathbf{R}_1| \le \ldots \le |\mathbf{R}_N|$. Correspondingly, we can also sort the (initially random) list of neighbor
atoms such that $|\mathbf{r}_1| \le \ldots \le |\mathbf{r}_N|$. Note that this is usually not sufficient for correctly
associating the actual atoms with the reference vectors: the bond lengths may be
perturbed by thermal displacements, and the ordering can be non-unique if neighbors
are arranged on discrete shells. However, we may compute a local hydrostatic scaling
factor, $\lambda$, from the two sorted bond lists:

$$\lambda = \frac{\sum_{i=1}^{N} |\mathbf{R}_i|}{\sum_{i=1}^{N} |\mathbf{r}_i|}. \qquad (2)$$

This scaling factor relates the lattice constant of the reference structure (which may be
arbitrary, and is often chosen to be unity) to that of the actual crystal, which depends
on factors such as hydrostatic stress, temperature, and composition.

The mapping between the reference vectors $(\mathbf{R}_1, \ldots, \mathbf{R}_N)$ and the neighbor atoms
$(a_1, \ldots, a_N)$, as we want to determine it, can be expressed in terms of a permutation
$\sigma = (\sigma_1, \ldots, \sigma_N)$ of the original, randomly ordered neighbor list. As discussed in
section 2.2, multiple equivalent permutations may exist due to the symmetries of the
coordination structure. Given any valid permutation map $\sigma$, we can obtain all other
equivalent mappings by applying the precomputed symmetry permutations $S_p$ to it.

How is a valid mapping $\sigma$ determined? For this we define a new type of signature
that is based on the linear distance $|\mathbf{R}_i - \mathbf{R}_j|$ between two neighbors i and j, which
is invariant under rotation. Hence, we give this approach the name neighbor distance
analysis (NDA). We assume that the coordination pattern p describes the equilibrium
positions of atoms. In the actual crystal, however, atoms may be displaced due to
thermal vibrations or elastic distortions of the lattice. Let the maximum allowed
deviation of an atom from its equilibrium position be given by a user-definable
parameter $\delta_{\max}$. Then the test structure $(\mathbf{r}_1, \ldots, \mathbf{r}_N)$ matches the reference pattern if
at least one mapping $\sigma$ exists such that the condition

$$d_{ij}^{\min} \equiv |\mathbf{R}_i - \mathbf{R}_j| - 2\delta_{\max} \;\le\; \lambda \, |\mathbf{r}_{\sigma_i} - \mathbf{r}_{\sigma_j}| \;\le\; |\mathbf{R}_i - \mathbf{R}_j| + 2\delta_{\max} \equiv d_{ij}^{\max} \qquad (3)$$

is fulfilled for all $N(N-1)/2$ neighbor pairs. That is, all rescaled distances must
lie in the corresponding intervals $[d_{ij}^{\min}, d_{ij}^{\max}]$ given by the reference structure. This
condition is illustrated by figure 1(a).




[Figure 1: panel (a) input structure and reference pattern; panel (b) pair distances and validity intervals; panel (c) early rejection test comparing sorted pair distances with sorted validity intervals.]

Figure 1. (a) Schematic picture of a low-symmetry coordination structure around
a central atom. Dashed circles indicate the maximum distance a neighbor may
deviate from its equilibrium position. This yields six min-max constraints on
the mutual distances between the four neighbors. (b) For a positive match,
a permutation of the neighbors must be found such that the actual distances
fall into the intervals of the reference pattern. (c) By simply sorting the actual
distances and the reference intervals, a quick rejection test can be performed
without knowledge of the actual mapping.
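Checking a given mapping against the NDA criterion can be sketched as follows. The sketch assumes that the scaling factor is $\lambda = \sum_i |\mathbf{R}_i| / \sum_i |\mathbf{r}_i|$ and that each reference pair distance may change by at most $2\delta_{\max}$, because each of the two neighbors may move by $\delta_{\max}$. The four-neighbor reference pattern is an arbitrary low-symmetry example in the spirit of figure 1, and all function names are hypothetical.

```python
import math
from itertools import combinations


def scaling_factor(ref, actual):
    """lambda: summed reference bond lengths over summed actual bond lengths."""
    return sum(math.hypot(*R) for R in ref) / sum(math.hypot(*r) for r in actual)


def validity_intervals(ref, delta_max):
    """Interval [d_min, d_max] for each reference pair (i, j), i < j."""
    intervals = {}
    for i, j in combinations(range(len(ref)), 2):
        d = math.dist(ref[i], ref[j])
        intervals[(i, j)] = (d - 2 * delta_max, d + 2 * delta_max)
    return intervals


def mapping_is_valid(ref, actual, sigma, delta_max):
    """True if the mapping sigma (reference index -> actual neighbor index)
    places every rescaled pair distance inside its validity interval."""
    lam = scaling_factor(ref, actual)
    for (i, j), (lo, hi) in validity_intervals(ref, delta_max).items():
        if not lo <= lam * math.dist(actual[sigma[i]], actual[sigma[j]]) <= hi:
            return False
    return True


# Low-symmetry reference pattern with four neighbors (arbitrary example).
ref = [(1.0, 0.0, 0.0), (0.0, 1.2, 0.0), (0.0, 0.0, 0.9), (-1.0, -0.5, 0.0)]
# Actual neighbors: the same arrangement, scaled by 2 and listed in another order.
actual = [tuple(2 * c for c in ref[k]) for k in (2, 0, 3, 1)]
print(mapping_is_valid(ref, actual, [1, 3, 0, 2], delta_max=0.05))  # True
print(mapping_is_valid(ref, actual, [0, 1, 2, 3], delta_max=0.05))  # False
```

The scaling factor absorbs the uniform factor of 2, so the correct mapping passes although every actual bond is twice as long as its reference counterpart.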

To find a valid permutation map $\sigma$ that fulfills condition (3) (or to confirm its
non-existence), up to $N!$ possible permutations of the neighbors must be tested (figure 1(b)).
To avoid an exhaustive search, the search space can, however, be reduced considerably
by pruning the combinatorial search tree and employing a backtracking algorithm
[18]. As an additional optimization step prior to the full combinatorial search, we
perform an early rejection test on the entire coordination structure by sorting both
the list of mutual distances $\{d_{ij}\}$ and the list of intervals $\{[d_{ij}^{\min}, d_{ij}^{\max}]\}$ in ascending
order (figure 1(c)). If any of the distances falls outside the corresponding reference
interval, no valid permutation map can exist and the test structure does not match
the coordination pattern.
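The search for a valid map $\sigma$, including the early rejection test, can be sketched as a small backtracking routine. The sketch assumes $\lambda = \sum_i |\mathbf{R}_i| / \sum_i |\mathbf{r}_i|$ and intervals $|\mathbf{R}_i - \mathbf{R}_j| \pm 2\delta_{\max}$; the function name `nda_match` is hypothetical, and the routine returns only one valid mapping (the others follow from $S_p$). Sorting the lower and upper interval bounds separately is a valid necessary condition: if some assignment of distances to intervals exists, the k-th smallest distance must lie between the k-th smallest lower bound and the k-th smallest upper bound.

```python
import math
from itertools import combinations


def nda_match(ref, actual, delta_max):
    """Return one mapping sigma (reference index -> actual neighbor index) that
    satisfies the NDA condition, or None if no such mapping exists."""
    n = len(ref)
    if len(actual) != n:
        return None
    lam = sum(math.hypot(*R) for R in ref) / sum(math.hypot(*r) for r in actual)
    pairs = list(combinations(range(n), 2))
    lo = {(i, j): math.dist(ref[i], ref[j]) - 2 * delta_max for i, j in pairs}
    hi = {(i, j): math.dist(ref[i], ref[j]) + 2 * delta_max for i, j in pairs}
    dist = [[lam * math.dist(a, b) for b in actual] for a in actual]

    # Early rejection: compare sorted distances with sorted interval bounds.
    d_sorted = sorted(dist[i][j] for i, j in pairs)
    if any(not lower <= d <= upper for d, lower, upper in
           zip(d_sorted, sorted(lo.values()), sorted(hi.values()))):
        return None

    sigma, used = [], set()

    def extend():
        k = len(sigma)  # next reference neighbor to be assigned
        if k == n:
            return True
        for cand in range(n):
            if cand in used:
                continue
            # Prune: distances to all neighbors assigned so far must fit.
            if all(lo[(i, k)] <= dist[sigma[i]][cand] <= hi[(i, k)]
                   for i in range(k)):
                sigma.append(cand)
                used.add(cand)
                if extend():
                    return True
                sigma.pop()
                used.discard(cand)
        return False

    return list(sigma) if extend() else None


ref = [(1.0, 0.0, 0.0), (0.0, 1.2, 0.0), (0.0, 0.0, 0.9), (-1.0, -0.5, 0.0)]
actual = [tuple(2 * c for c in ref[k]) for k in (2, 0, 3, 1)]
print(nda_match(ref, actual, delta_max=0.01))  # [1, 3, 0, 2]
other = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (-1.0, -1.0, -1.0)]
print(nda_match(ref, other, delta_max=0.01))   # None (early rejection)
```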

The user needs to specify two control parameters for each NDA coordination
pattern: the number of nearest neighbors to be taken into account (N) and the
maximum admissible displacement ($\delta_{\max}$). N must be at least three, should include
complete shells, and, apart from that, be as small as possible for best efficiency. We
also require that a coordination pattern includes enough neighbors such that each of
them is itself a neighbor of at least two others in the same coordination pattern. This
requirement will be needed to facilitate the consistent alignment of lattice orientations
at each crystal site (see section 3). For the bcc lattice, for instance, this means that
N must be 14; the second shell of neighbors must be included such that any two
neighboring bcc atoms possess some common neighbors.
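The neighbor requirement can be checked directly on the bond vectors of a pattern. The sketch below assumes a monatomic lattice, in which every neighbor has the same coordination radius as the central atom, taken as the length of the longest pattern vector plus a small tolerance; the function name is hypothetical. It reproduces the bcc example: with only the eight first-shell neighbors the check fails, and with N = 14 it passes.

```python
import math
from itertools import product


def satisfies_connectivity(pattern, tol=1e-6):
    """True if each neighbor vector has at least two other pattern vectors
    within the coordination radius (assumed: longest pattern vector)."""
    radius = max(math.hypot(*v) for v in pattern) + tol
    for i, v in enumerate(pattern):
        close = sum(1 for j, w in enumerate(pattern)
                    if j != i and math.dist(v, w) <= radius)
        if close < 2:
            return False
    return True


corners = list(product((-1, 1), repeat=3))  # bcc first shell, a = 2
axes = [(2, 0, 0), (-2, 0, 0), (0, 2, 0), (0, -2, 0), (0, 0, 2), (0, 0, -2)]
print(satisfies_connectivity(corners))         # False (N = 8)
print(satisfies_connectivity(corners + axes))  # True  (N = 14)
```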

The maximum admissible displacement parameter $\delta_{\max}$ determines the tolerance
of the identification process. In general one would want to use a large $\delta_{\max}$ to make
the recognition of structures robust at high temperatures or in the presence of strong
elastic distortions. On the other hand, an overly large $\delta_{\max}$ may lead to false
positives when testing against multiple, only slightly different coordination patterns.
That is, the local structure may match multiple patterns if the agreement is within
the specified tolerance for each of them. In most cases, the resulting ambiguity will
be resolved by the subsequent multi-atom analysis step (section 3), where we take the
non-local atomic structure beyond the nearest neighbors into account as an additional
criterion.

Note that we propose the NDA only for identifying defective coordination
structures that cannot be handled well with existing methods. In simple cases (such
as perfect fcc, hcp, or bcc lattices), conventional techniques such as the
CNA might be the more economical choice. Thus, in practice, we use a combination of
both the NDA and CNA to characterize local atomic structures. The CNA matching
algorithm can be extended in a similar fashion such that it also yields the permutation
map $\sigma$.



3. Multi-atom pattern analysis 

Except for the simplest cases, most lattices consist of several atoms per unit cell, each 
having a different coordination structure. A local coordination analysis of individual 
atoms alone is not sufficient to unambiguously identify such extended structures. Let 
us take the 9R lattice [19] as an illustrative example. This close-packed structure
(stacking ...ABCBCACAB...) is a repeating sequence of three atomic layers. One
atomic layer has an fcc-like local coordination, while the other two contain atoms with
an hcp-like arrangement of neighbors (see figure 8(a)). It is clear that the coordination
structures of multiple atoms need to be taken into account simultaneously to identify
such a lattice structure.

This is the task of the second algorithm described in the following sections. It 
searches for occurrences of a multi-atom pattern (derived from the lattice's primitive 
unit cell) in the input data. The search algorithm divides the crystal into contiguous 
domains that exhibit perfect long-range order. Note that many planar crystal defects 
such as stacking faults, surfaces, coherent crystal interfaces and grain boundaries 
have a structure that can be described in terms of a pattern that is repeated in 
two dimensions. Thus the same pattern matching algorithm can be used to find 
occurrences of such planar defects in a microstructure. Furthermore, small point-like 
defects such as vacancy or interstitial clusters can also be identified and counted with
the same universal algorithm. 

3.1. Definition of a multi-atom pattern 

A multi-atom pattern is a graph structure combining several coordination patterns
into a single unit. Our representation scheme is based on so-called periodic graphs
or crystal nets, which are used for topological descriptions of crystal structures in
crystal chemistry [20]. Following the terminology given in [21], a multi-atom pattern
is a node- and edge-labeled directed periodic connected graph $G_p = (V_p, E_p, \Sigma, p)$.
Here, $V_p$ is a set of nodes (atoms); $E_p$ is a set of directed edges (bonds); $\Sigma$ is a set
of node labels (the catalog of coordination patterns), and p is a mapping function
that assigns each node $v_i \in V_p$ a coordination pattern $p(v_i) \in \Sigma$. A directed edge
$(v_i, v_j, \mathbf{L}) \in E_p$ leads from node $v_i$ to node $v_j$, and is labeled with a vector $\mathbf{L} \in \mathbb{R}^3$
that connects the two corresponding atoms in periodic Euclidean space.

A path is an alternating sequence of n nodes $(v_1, v_2, \ldots, v_n)$ and $n - 1$ consecutive
edges $((v_1, v_2, \mathbf{L}_{12}), (v_2, v_3, \mathbf{L}_{23}), \ldots, (v_{n-1}, v_n, \mathbf{L}_{n-1,n}))$. A circuit is a closed
path in which the first and the last node are the same. We use $\Delta\mathbf{L} = \sum_{i=1}^{n-1} \mathbf{L}_{i,i+1}$
to denote the sum of edge vectors in a path. The sum of edge vectors of a closed circuit is
always an integer linear combination of the repeat vectors $\mathbf{h}_1, \mathbf{h}_2, \mathbf{h}_3 \in \mathbb{R}^3$ of the
unit cell, i.e. $\Delta\mathbf{L} = n_1\mathbf{h}_1 + n_2\mathbf{h}_2 + n_3\mathbf{h}_3$ with $n_1, n_2, n_3 \in \mathbb{Z}$.
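The circuit property can be illustrated with the one-node periodic graph of the fcc lattice, whose twelve self-edges carry the $\frac{a}{2}\langle 110 \rangle$ vectors (in a one-node graph every path is a circuit). The sketch below sums the edge vectors along a path and solves $\Delta\mathbf{L} = n_1\mathbf{h}_1 + n_2\mathbf{h}_2 + n_3\mathbf{h}_3$ exactly with Cramer's rule. It assumes integer coordinates in units of $a/2$ and the primitive fcc vectors as $\mathbf{h}_k$; all names and the edge-list representation are hypothetical.

```python
from fractions import Fraction
from itertools import product


def path_sum(edges, path):
    """Sum of edge vectors Delta L along a path given as a list of edge indices.
    Returns (delta_L, first node, last node)."""
    node = start = edges[path[0]][0]
    total = (0, 0, 0)
    for e in path:
        src, dst, vec = edges[e]
        if src != node:
            raise ValueError("consecutive edges are not connected")
        total = tuple(t + c for t, c in zip(total, vec))
        node = dst
    return total, start, node


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def lattice_coefficients(delta, h):
    """Solve delta = n1*h1 + n2*h2 + n3*h3 exactly (Cramer's rule on integers)."""
    def matrix(cols):  # 3x3 matrix (list of rows) with the given columns
        return [[cols[k][r] for k in range(3)] for r in range(3)]
    d = det3(matrix(h))
    coeffs = []
    for k in range(3):
        cols = list(h)
        cols[k] = delta
        coeffs.append(Fraction(det3(matrix(cols)), d))
    return coeffs


vectors = []
for x, y in product((-1, 1), repeat=2):
    vectors += [(x, y, 0), (x, 0, y), (0, x, y)]
edges = [(0, 0, v) for v in vectors]   # one node, twelve self-edges
h = [(0, 1, 1), (1, 0, 1), (1, 1, 0)]  # primitive fcc repeat vectors (a = 2)

circuit = [vectors.index((1, 1, 0)), vectors.index((1, -1, 0))]
delta, first, last = path_sum(edges, circuit)
print(delta, first == last)            # (2, 0, 0) True
print(lattice_coefficients(delta, h))  # n1, n2, n3 = -1, 1, 1
```

All three coefficients come out as integers, as required for a closed circuit.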

The outdegree $\deg(v_i)$ of a node $v_i$ denotes the number of edges to neighboring
nodes. According to section 2.2, the coordination pattern $p(v_i) \in \Sigma$ assigned to a
node $v_i$ is essentially an ordered list $(\mathbf{R}_1, \ldots, \mathbf{R}_N)$ of $N = \deg(v_i)$ vectors pointing
from a center point to the neighbors. There is a one-to-one correspondence between
the vectors of the coordination pattern and the incident edges of the node it is assigned 
to. However, by labeling each incident edge with an independent vector L, which does 
not have to coincide with the corresponding vector R of the coordination pattern, 
we add an extra level of indirection. This allows us to decouple the definition of 
coordination patterns from their usage in the multi-atom pattern. In particular, we can 
exploit rotational symmetries of structures and assign the same coordination pattern 
to multiple nodes which have a similar but rotated coordination structure. 

A multi-atom pattern is either 3-periodic, 2-periodic or non-periodic, depending 
on whether it describes a lattice structure, a planar defect, or a point-like defect. 
For simple lattices such as fcc, the periodic graph consists of only one node that is
connected to itself via a number of edges. The assigned edge vectors $\mathbf{L}$ implicitly
specify the geometry of the repeat unit (i.e. $\mathbf{h}_1, \mathbf{h}_2, \mathbf{h}_3$).

In the case of ordered multi-component systems, each node is also labeled with 
a chemical species. This chemical information will serve as an additional criterion in 
the matching algorithm for cases where the structural information alone, given by the 
coordination patterns, is ambiguous (as, for instance, in the L1$_2$ structure).

The generation of a multi-atom pattern can be fully automated and is based on the atomic coordinates of an ideal template structure provided by the user. The catalog of coordination patterns, $\Sigma$, is created by successively inspecting each atomic site in the template structure. If the local coordination structure does not match any of the existing coordination patterns, then a new pattern is created for that node and added to the catalog. Figure 2 depicts template cells for various structures from which multi-atom patterns and coordination patterns are derived. Here, only three distinct coordination patterns (indicated by the colors red, green, and blue) are necessary to characterize the nodes in the multi-atom patterns.
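The catalog-building loop can be sketched as follows. To keep the example self-contained, the comparison of coordination structures uses a simplified, rotation-invariant fingerprint (sorted pairwise distances between neighbors, normalized by the mean neighbor distance) in place of the CNA or NDA matching used in the text; this substitution, the rounding tolerance, and all function names are assumptions. Such a fingerprint cannot distinguish mirror images and could in principle collide for different shells.

```python
import math
from itertools import combinations, permutations, product


def fingerprint(neighbor_vectors, decimals=6):
    """Rotation- and scale-invariant signature: sorted pairwise distances
    between neighbors divided by the mean neighbor distance (stand-in only)."""
    origin = (0.0, 0.0, 0.0)
    mean_r = sum(math.dist(v, origin) for v in neighbor_vectors) / len(neighbor_vectors)
    dists = sorted(round(math.dist(a, b) / mean_r, decimals)
                   for a, b in combinations(neighbor_vectors, 2))
    return len(neighbor_vectors), tuple(dists)


def build_catalog(template_sites):
    """Visit each template site; create a new coordination pattern whenever the
    site's signature is not yet in the catalog. Returns (catalog, assignment)."""
    catalog, assignment = [], []
    for neighbors in template_sites:
        key = fingerprint(neighbors)
        if key not in catalog:
            catalog.append(key)
        assignment.append(catalog.index(key))
    return catalog, assignment


def rotate(v, axis, angle):
    """Rodrigues rotation of v about `axis` (normalized here) by `angle`."""
    n = math.sqrt(sum(a * a for a in axis))
    k = [a / n for a in axis]
    c, s = math.cos(angle), math.sin(angle)
    cross = (k[1] * v[2] - k[2] * v[1], k[2] * v[0] - k[0] * v[2], k[0] * v[1] - k[1] * v[0])
    dot = sum(ki * vi for ki, vi in zip(k, v))
    return tuple(v[i] * c + cross[i] * s + k[i] * dot * (1 - c) for i in range(3))


def fcc_shell():
    """The 12 fcc nearest-neighbor vectors in units of a/2."""
    vecs = set()
    for sx, sy in product((1, -1), repeat=2):
        vecs.update(permutations((sx, sy, 0)))
    return sorted(vecs)


def hcp_shell():
    """Ideal hcp shell with neighbor distance 1 and c/a = sqrt(8/3)."""
    shell = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3), 0.0) for k in range(6)]
    half_c, r = math.sqrt(2.0 / 3.0), 1.0 / math.sqrt(3.0)
    for k in range(3):
        phi = math.pi / 6 + 2 * math.pi * k / 3
        for z in (half_c, -half_c):
            shell.append((r * math.cos(phi), r * math.sin(phi), z))
    return shell


# Template sites: an fcc site, a rotated fcc site, and an hcp site.
sites = [fcc_shell(), [rotate(v, (1, 2, 3), 0.7) for v in fcc_shell()], hcp_shell()]
catalog, assignment = build_catalog(sites)
print(len(catalog), assignment)
```

The rotated fcc site reuses the first pattern, mirroring how one coordination pattern can serve several nodes whose environments differ only by a rotation.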



[Figure 2 image. (a) Multi-atom pattern templates: hcp lattice, fcc lattice, L1$_2$ lattice, 9R lattice, fcc {111} surface, coherent twin boundary (in fcc). (b) Coordination patterns: hcp, fcc, fcc {111} surface.]



Figure 2. (a) Several template structures from which a catalog of multi-atom patterns and coordination patterns is derived. The first four structures (hcp, fcc, L1$_2$, and 9R) are 3-periodic lattices, while the other two structures (fcc surface and fcc twin boundary) are 2-periodic planar defects. (b) Three different coordination patterns occur in the six multi-atom patterns. The assignment of coordination patterns to the multi-atom pattern nodes is indicated by the colors red, green, and blue.



3.2. Generation of the input graph

The pattern matching method that we use to find occurrences of a catalog of multi-atom patterns $\Pi = \{G_p\}$ in an atomistic simulation snapshot can be divided into two phases. First, the atomistic simulation snapshot is transformed into an input graph $H_{\mathrm{in}}$ by testing each atom against the patterns in the coordination pattern catalog $\Sigma$. In the second phase, we search for clusters of atoms (i.e. subgraphs of $H_{\mathrm{in}}$) that topologically agree with one of the periodic pattern graphs $\{G_p\}$.

Let $V_a$ be the set of atoms in the input snapshot, which constitute the nodes of the input graph $H_{\mathrm{in}}$. We test each atom $a_c \in V_a$ against each coordination pattern $p \in \Sigma$, using an appropriate coordination pattern matching algorithm (CNA or NDA). A positive match yields a list $\sigma_p = (a_1, \ldots, a_N) \subset V_a$ that maps the $N$ bonds of $p$ to an equal number of neighbors of $a_c$. The determination of the permutation map $\sigma_p$ was discussed in section 2. We label $a_c$ with the set of all positive matches, $\{\sigma_p\}$. Note that an atom may not match any coordination pattern at all, in which case this set is empty. Alternatively, it may match multiple, rather similar patterns, in which case $\{\sigma_p\}$ contains multiple entries. Each match record $\sigma_p$ implicitly defines a set of directed edges from atom $a_c$ to neighboring atoms in $H_{\mathrm{in}}$.

Given a match record $\sigma_p$, we can apply the coordination pattern's point symmetry group $S_p$ to it to generate other equivalent matches. We assume that an atom's match list, $\{\sigma_p\}$, (implicitly) contains these equivalent matches too.
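A match record and its symmetry-equivalent copies can be illustrated with a short sketch. Here a match record is a tuple whose $k$-th entry is the atom found along pattern vector $\mathbf{R}_k$, and each point-symmetry operation of $S_p$ is encoded, as an assumption, by the permutation of bond indices it induces. The toy pattern is a planar four-bond star with 4-fold rotational symmetry; the function names are hypothetical.

```python
def equivalent_matches(match, symmetry_ops):
    """All match records obtained by composing sigma_p with the pattern's
    point-symmetry operations.

    match        -- tuple, match[k] = atom bonded along pattern vector R_k
    symmetry_ops -- permutations g of bond indices such that R_g[k] = Q R_k
                    for some rotation Q of the pattern (hypothetical encoding)
    """
    return {tuple(match[g[k]] for k in range(len(match))) for g in symmetry_ops}


def directed_edges(center, match):
    """Edges (center, neighbor, bond index) implied by one match record."""
    return [(center, neighbor, k) for k, neighbor in enumerate(match)]


# Toy planar pattern: R_0=+x, R_1=+y, R_2=-x, R_3=-y with 4-fold rotations.
c4 = [tuple((k + j) % 4 for k in range(4)) for j in range(4)]
record = (10, 11, 12, 13)          # atoms found along +x, +y, -x, -y
matches = equivalent_matches(record, c4)
print(sorted(matches))
print(directed_edges(5, record))
```

Each of the four records assigns the same neighbor atoms to different bond indices, i.e. it represents a rotated choice of local orientation that the later graph search can pick from.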

3.3. Pattern search algorithm 

To search for occurrences of a pattern graph $G_p$ in the input graph $H_{\mathrm{in}}$, we use a variation of Ullmann's graph matching algorithm [22]. This algorithm starts by mapping a first node of $G_p$ to a matching seed atom in $H_{\mathrm{in}}$. Then neighbor nodes are successively mapped to matching neighbors of the seed atom to expand the partial match. The algorithm recursively continues with the neighbors' neighbors until eventually the entire pattern graph $G_p$ has been mapped to a subgraph of $H_{\mathrm{in}}$. Unfruitful search paths are pruned: if a pattern node cannot be associated with any available atom in $H_{\mathrm{in}}$ due to a violation of the isomorphism conditions, the algorithm backtracks and undoes the last assignment to continue with another branch of the state search tree.
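The seed-and-expand backtracking described above can be sketched for small, non-periodic graphs as follows. This is plain depth-first backtracking over pattern nodes visited in breadth-first order; Ullmann's candidate-refinement step, periodic patterns, and the choice among match records are deliberately left out. Node labels stand in for the coordination pattern and chemical species; all names are hypothetical.

```python
from collections import deque


def bfs_order(adj, start):
    """Pattern nodes in breadth-first order, so each node after the first has
    an already-visited neighbor (assumes a connected pattern)."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return order


def find_embedding(p_adj, p_label, g_adj, g_label):
    """Return one injective map pattern node -> input atom, or None.

    Labels must agree and every pattern edge must map onto an input edge."""
    order = bfs_order(p_adj, next(iter(p_adj)))
    mapping, used = {}, set()

    def fits(p, a):
        if a in used or p_label[p] != g_label[a]:
            return False
        return all(mapping[q] in g_adj[a] for q in p_adj[p] if q in mapping)

    def extend(i):
        if i == len(order):
            return True
        p = order[i]
        anchors = [q for q in p_adj[p] if q in mapping]
        # Seed: any atom; afterwards only neighbors of an already-mapped atom.
        candidates = g_adj[mapping[anchors[0]]] if anchors else g_adj.keys()
        for a in sorted(candidates):
            if fits(p, a):
                mapping[p] = a
                used.add(a)
                if extend(i + 1):
                    return True
                del mapping[p]      # backtrack: undo the last assignment
                used.discard(a)
        return False

    return dict(mapping) if extend(0) else None


pattern = {"x": ["y", "z"], "y": ["x", "z"], "z": ["x", "y"]}
pattern_labels = {"x": "A", "y": "B", "z": "A"}
atoms = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
atom_labels = {0: "A", 1: "A", 2: "B", 3: "A"}
print(find_embedding(pattern, pattern_labels, atoms, atom_labels))
```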

For our application, Ullmann's original algorithm has been modified in two aspects. The first modification pertains to the implicit representation of edges in the input graph $H_{\mathrm{in}}$. In the previous section we pointed out that an atom can have several potential coordination structures (i.e. sets of edges to other atoms), given by the atom's match list $\{\sigma_p\}$. Multiple match records result from local point symmetries and the ambiguity of coordination patterns. The matching algorithm therefore has to decide on one of these coordination structures for each atom. To this end, we extend the search space of Ullmann's graph matching algorithm to include each atom's list of potential coordination structures.

A subgraph isomorphism found by the graph matching algorithm is then fully specified in terms of a set of tuples, $C = \{(v_i, a_i, \sigma_p)\}$, that associate atoms of the input snapshot with nodes of the pattern graph. Here, $v_i \in V_p$ is a node of $G_p$, $a_i \in V_a$ is a corresponding atom in $H_{\mathrm{in}}$, and $\sigma_p$ is one of the atom's match records, i.e. a map from the coordination pattern $p$ to a set of neighbors of $a_i$ (cf. section 3.2). A necessary isomorphism condition is that $\sigma_p$ is a match record for the node's coordination pattern $p(v_i)$, and that the atom's chemical species matches that of the pattern node.

The map $\sigma_p$ can then be inverted such that, given a neighbor atom $a_j \in \sigma_p$ of $a_i$, we obtain its index in the coordination pattern $p(v_i)$. Together with the node $v_i$, this yields the corresponding edge $(v_i, v_j, \mathbf{L}) \in E_p$ in the multi-atom pattern. Thus, we have defined a lookup function $\mathbf{L}_C(a_i, a_j)$ that, given two neighboring atoms, yields a corresponding ideal bond vector in the unit cell. A second necessary isomorphism condition is $\mathbf{L}_C(a_i,a_j) = -\mathbf{L}_C(a_j,a_i)$ and $\mathbf{L}_C(a_i,a_j) + \mathbf{L}_C(a_j,a_k) + \mathbf{L}_C(a_k,a_i) = \mathbf{0}$ for every pair $(a_i, a_j)$ and every triplet $(a_i, a_j, a_k)$ of mutually neighboring atoms in the matched region of $H_{\mathrm{in}}$. In other words, the lattice orientation picked at one site must match those picked at neighboring sites. The graph matching algorithm uses these conditions to prune the combinatorial search tree.
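The two consistency conditions on the lookup function can be checked with a few lines of Python. In this sketch $\mathbf{L}_C$ is represented, as an assumption, by a dictionary keyed by ordered atom pairs; the example uses three mutually neighboring fcc atoms with vectors in units of $a/2$, and the function names are hypothetical.

```python
from collections import defaultdict


def is_antisymmetric(L, tol=1e-9):
    """L(a_i, a_j) == -L(a_j, a_i) for every stored neighbor pair."""
    for (i, j), v in L.items():
        w = L.get((j, i))
        if w is None or any(abs(x + y) > tol for x, y in zip(v, w)):
            return False
    return True


def closes_triangles(L, tol=1e-9):
    """L(a_i,a_j) + L(a_j,a_k) + L(a_k,a_i) == 0 for every triangle of
    mutually neighboring atoms."""
    nbrs = defaultdict(set)
    for i, j in L:
        nbrs[i].add(j)
    for i, j in L:
        for k in nbrs[j]:
            if k != i and (k, i) in L:
                total = [a + b + c for a, b, c in zip(L[(i, j)], L[(j, k)], L[(k, i)])]
                if any(abs(x) > tol for x in total):
                    return False
    return True


def with_reverse(bonds):
    """Complete a dict of bond vectors with the reversed (negated) entries."""
    full = dict(bonds)
    for (i, j), v in bonds.items():
        full[(j, i)] = tuple(-x for x in v)
    return full


good = with_reverse({(0, 1): (1, 1, 0), (1, 2): (-1, 0, 1), (2, 0): (0, -1, -1)})
bad = with_reverse({(0, 1): (1, 1, 0), (1, 2): (-1, 0, 1), (2, 0): (0, -1, 1)})
print(is_antisymmetric(good), closes_triangles(good))
print(is_antisymmetric(bad), closes_triangles(bad))
```

In the inconsistent case every individual vector is still a valid nearest-neighbor vector; only the closure test reveals that the three sites picked incompatible orientations.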

Note that, since $G_p$ is a periodic graph, it may match a subgraph of $H_{\mathrm{in}}$ that is larger than $G_p$ itself. Thus, the subgraph isomorphism $C$ is no longer bijective, and a single node $v_i \in V_p$ of $G_p$ may be mapped to an arbitrary number of atoms $a_i \in V_a$ in $H_{\mathrm{in}}$ as long as perfect long-range order is not interrupted. For instance, if $G_p$ is a lattice unit cell, then an entire crystallite is returned by a single invocation of the graph matching algorithm. We call the periodic subgraph isomorphism complete, or a cluster, if it maps every pattern node $v_i \in V_p$ to at least one atom. A cluster $C$ is maximal if no more atoms at its perimeter can be added to it without violating the isomorphism conditions, i.e. without interrupting the long-range order. In the following, $V(C) \subseteq V_a$ denotes the set of crystal atoms that make up a cluster $C$.



3.4. The vicinity criterion

In practice, the generation of maximal clusters can lead to counterintuitive results. 



Figure 3(a) depicts such a situation. Here, the periodic pattern $G_p$ comprises four nodes labeled with the species A and B. The crystal being analyzed contains an anti-phase boundary (APB), which interrupts the long-range order. Since every other row matches the periodic pattern, irrespective of the phase shift at the APB, the left cluster 'bleeds' into the right cluster.



[Figure 3 image. (a) Without vicinity criterion; (b) with vicinity criterion. Each panel shows the pattern $G_p$ of A and B atoms and the anti-phase boundary.]



Figure 3. Both pictures show the same crystal as it is being decomposed into 
clusters with perfect long-range order. The biatomic crystal lattice contains an 
anti-phase boundary (APB). Without the vicinity criterion discussed in the text 
(left), the first cluster extends beyond the APB, since single rows of A atoms 
exhibit perfect order. This artifact can be avoided by enforcing a vicinity criterion 
(right), which ensures that an atom becomes part of a growing cluster only if all 
other atoms of the unit cell appear in its vicinity (and on the correct sites). 



To mitigate such undesirable effects, one can add an additional isomorphism condition, which we call a vicinity criterion: an atom $a$ may only become part of a cluster if all nodes of the pattern appear at least once in a neighborhood $\mathcal{N}(a)$ of $a$. Note that we have some freedom in how we define this neighborhood.

The distance $d(a_i, a_j)$ between two atoms $a_i, a_j \in V(C)$ is the number of steps in the shortest path with initial node $a_i$ and terminal node $a_j$ that is completely contained in the cluster $C$. The neighborhood of size $s$ is the set $\mathcal{N}_s(a) = \{a_j \in V(C) : d(a, a_j) \le s\}$. In our implementation we use the following vicinity criterion. Let $N_p = |V_p|$ denote the number of nodes in the periodic pattern graph. For a candidate atom $a$ to become part of the cluster, we require that the set of pattern nodes found in the neighborhood $\mathcal{N}_s(a)$ contains at least $s$ unique pattern nodes for every size $1 \le s < N_p$. That is, after the $s$-th recursive step away from the candidate atom, we must have encountered at least $s$ unique pattern nodes in its vicinity (in addition to the central node). If this criterion is not fulfilled, then the candidate atom is rejected from the cluster (see figure 3(b)).
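The vicinity test can be sketched as a breadth-first search restricted to the cluster. The sketch reads "in addition to the central node" as excluding the candidate's own pattern node from the count, which is an interpretation; the adjacency dictionary, node labels, and function name are hypothetical. The two toy chains show an accepted candidate and one rejected because its nearest A and B atoms are too far away, analogous to the D atom in figure 4.

```python
from collections import deque


def satisfies_vicinity(candidate, adj, node_of, n_pattern_nodes):
    """For every 1 <= s < N_p, the atoms within s steps of the candidate
    (paths restricted to the cluster graph `adj`) must carry at least s
    distinct pattern nodes other than the candidate's own node."""
    dist = {candidate: 0}
    queue = deque([candidate])
    while queue:
        a = queue.popleft()
        if dist[a] >= n_pattern_nodes - 1:
            continue                      # farther atoms are never needed
        for b in adj[a]:
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    own = node_of[candidate]
    for s in range(1, n_pattern_nodes):
        seen = {node_of[a] for a, d in dist.items() if d <= s} - {own}
        if len(seen) < s:
            return False
    return True


# Chains of atoms; the candidate is atom 0 on a 'D' site, N_p = 4 (A, B, C, D).
ok_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
ok_nodes = {0: "D", 1: "C", 2: "A", 3: "B"}
far_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
far_nodes = {0: "D", 1: "C", 2: "C", 3: "A", 4: "B"}
print(satisfies_vicinity(0, ok_adj, ok_nodes, 4))
print(satisfies_vicinity(0, far_adj, far_nodes, 4))
```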

Note that alternative kinds of vicinity conditions are possible. The best choice 
depends, in general, on the size and the geometry of the structural unit cell. 
Figure 4. Illustration of the decomposition of a crystal into clusters. The letters A-E represent the local coordination structure or species of an atom. The clustering algorithm finds two periodic clusters (subgraphs) in the test structure that exactly match the pattern template. Two other regions remain (shaded atoms), which have an unidentified structure. Note that the D atom in the upper right corner is not part of a cluster even though it is located on a correct lattice site. It has been rejected because it violates the vicinity criterion by being more than two neighbor distances away from the nearest A or B atom.






3.5. Crystal defects 

The input graph $H_{\mathrm{in}}$ is scanned for one multi-atom pattern from the catalog $\Pi = \{G_p\}$ at a time. Once a cluster has been found by the pattern search algorithm, all atoms belonging to that cluster are marked and will be ignored in subsequent searches. Thus, any atom can become part of at most one cluster. Figure 4 schematically shows the outcome of the clustering algorithm for a pattern consisting of four nodes.

So far we have focused on the identification of periodic crystal lattices. Let us now consider the application of the pattern matching algorithm to lattice defects. In contrast to crystal lattices, defects cannot exist on their own: they are always embedded in a specific lattice or, in the case of interfaces, located between two crystallite clusters. It therefore makes sense to include the specification of the surrounding lattice type (or types) in the description of a crystal defect pattern to avoid ambiguities. Let us take intrinsic stacking faults (ISF) in the fcc lattice as an example (see figure 5). Such defects consist of two atomic layers having the same coordination structure as atoms in the hcp lattice (stacking sequence ...ABC|AC|ABC...). Thus, one could regard an ISF as a thin hcp-phase layer that is separated from the fcc crystal by two coherent interfaces. Conversely, one could consider a bulk hcp crystal as a periodic stacking of ISF defects. Neither view would be completely correct. We can resolve this ambiguity by requiring ISF defects to be always bordered by fcc clusters on both sides and by giving defect clusters precedence over lattice clusters.

To this end, we split the multi-atom pattern catalog into two partial catalogs, $\Pi_{\mathrm{lat}}$ and $\Pi_{\mathrm{def}}$, that contain only lattice patterns and defect patterns, respectively. Two separate search passes are performed, one for the set $\Pi_{\mathrm{lat}}$ and one for $\Pi_{\mathrm{def}}$. Clusters found during any one pass may not overlap with each other, but defect clusters found during the second pass can overlap with lattice clusters found during the first pass. To prevent bulk hcp atoms (in a genuine hcp phase) from being labeled as ISF defects, we extend the multi-atom pattern for ISFs to include four nodes, as shown in figure 5: the two hcp-coordinated atoms forming the defect core, and two additional fcc atoms on either side of the defect plane. These fcc nodes at the perimeter of the defect core overlap with the surrounding lattice and function as a filter. This ensures that the ISF pattern can be matched only to atomic arrangements embedded in an fcc crystal and not in other crystal phases.

Figure 5. Cross-sectional view of a defective fcc crystal that contains a stacking fault bounded by a partial dislocation. The crystal has been decomposed into two clusters (and the unidentified dislocation core region). The multi-atom pattern for the stacking fault defect comprises additional overlap nodes to ensure that occurrences of the defect are always embedded in an fcc lattice.

In a defect multi-atom pattern, the lattice-like nodes at the perimeter of the defect core are flagged as overlap nodes, meaning that, even though they must be present in the input structure, they are not considered part of the defect itself. That is, the corresponding atoms will not be incorporated into the defect cluster, but stay part of the surrounding lattice clusters.
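How the two search passes and the overlap nodes combine into one label per atom can be sketched as follows. The sketch takes the clusters found by each pass as given input rather than running the pattern search itself; the record layout (defect type, core atoms, overlap atoms, required host lattice) and the function name are hypothetical. The example reproduces the two ISF cases discussed above: a thin hcp layer between fcc regions, and a genuine bulk hcp phase.

```python
def assign_labels(n_atoms, lattice_clusters, defect_clusters):
    """Merge the two search passes into one label per atom (None = unidentified).

    lattice_clusters -- [(lattice_type, atoms)] from pass 1, mutually disjoint
    defect_clusters  -- [(defect_type, core_atoms, overlap_atoms, host_type)]
                        candidates from pass 2 (hypothetical record layout)
    """
    labels = [None] * n_atoms
    for lattice_type, atoms in lattice_clusters:
        for a in atoms:
            if labels[a] is not None:
                raise ValueError(f"lattice clusters overlap at atom {a}")
            labels[a] = lattice_type
    taken = set()
    for defect_type, core, overlap, host in defect_clusters:
        # Overlap nodes act as a filter: they must lie in the host lattice.
        if any(labels[a] != host for a in overlap):
            continue
        if taken & set(core):
            continue                      # defect clusters may not overlap
        taken |= set(core)
        for a in core:                    # defects take precedence over lattices
            labels[a] = defect_type
        # Overlap atoms keep their lattice label.
    return labels


isf = [("ISF", {4, 5}, {3, 6}, "fcc")]
embedded = assign_labels(10, [("fcc", range(0, 4)), ("hcp", range(4, 6)),
                              ("fcc", range(6, 10))], isf)
bulk_hcp = assign_labels(10, [("hcp", range(10))], isf)
print(embedded)
print(bulk_hcp)
```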



4. Cluster analysis 

In section 3.3 we introduced the lookup function $\mathbf{L} = \mathbf{L}_C(a_i, a_j)$, which, given a pair of neighbor atoms in a cluster $C$, returns the ideal vector $\mathbf{L} \in \mathbb{R}^3$ connecting the two corresponding nodes in the multi-atom pattern. $\mathbf{L}$ is a cluster-space vector; it is given in the reference frame of the structure template cell from which the multi-atom pattern was derived. Correspondingly, $\mathbf{r} = \mathbf{r}(a_i, a_j) = \mathbf{r}_j - \mathbf{r}_i$ is the vector connecting the two atoms in the simulation's frame of reference. Accordingly, $\mathbf{r}$ is a spatial vector.

Note that, in the spatial configuration, a cluster can be elastically distorted (and with it the interatomic vectors $\mathbf{r}(a_i, a_j)$). In the cluster's reference frame, in contrast, atoms are positioned on ideal sites at all times. That is, the vectors $\mathbf{L}_C(a_i, a_j)$ are unaffected by thermal or elastic perturbations of the atomic positions.



4.1. Cluster orientation

All atoms of a cluster $C$ form a contiguous periodic pattern with long-range order that has an arbitrary orientation with respect to the simulation coordinate system. We can compute an affine transformation matrix $F_C$ that connects the cluster's frame with the spatial frame in a least-squares sense:



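$$
\begin{aligned}
F_C &= W V^{-1}, \\
V &= \sum_{(a_i,a_j)} \mathbf{L}_C(a_i,a_j) \otimes \mathbf{L}_C(a_i,a_j), \\
W &= \sum_{(a_i,a_j)} \mathbf{r}(a_i,a_j) \otimes \mathbf{L}_C(a_i,a_j).
\end{aligned}
\tag{4}
$$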

Here, the summations are over all neighbor pairs $(a_i, a_j)$ in the cluster; $\otimes$ denotes the outer product of two vectors, $(\mathbf{a} \otimes \mathbf{b})_{kl} = a_k b_l$. $F_C$ characterizes the average, macroscopic orientation of the crystallite cluster, including rigid-body rotations and elastic stretches. The lattice orientation might vary locally, though, due to elastic distortions of the crystal. Thus, in general, the macroscopic orientation agrees only approximately with the microscopic orientation, i.e. $F_C \mathbf{L}_C(a_i, a_j) \approx \mathbf{r}(a_i, a_j)$.

Figure 6. Illustration of the different reference frames used in the description of cluster orientations and cluster transitions.
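The least-squares fit for the cluster orientation can be sketched in pure Python, written here in the column-vector convention $F_C \mathbf{L} \approx \mathbf{r}$, for which $F_C = W V^{-1}$ with $V = \sum \mathbf{L} \otimes \mathbf{L}$ and $W = \sum \mathbf{r} \otimes \mathbf{L}$. The input is synthetic: ideal fcc bond vectors (units of $a/2$) mapped by a known rotation plus stretch, without thermal noise, so the fit should recover that matrix. Function names are hypothetical.

```python
import math


def matmul(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def apply(m, v):
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


def inverse(m):
    """3x3 inverse via the adjugate (cyclic cofactor formula)."""
    det = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
           - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
           + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    return [[(m[(j + 1) % 3][(i + 1) % 3] * m[(j + 2) % 3][(i + 2) % 3]
              - m[(j + 1) % 3][(i + 2) % 3] * m[(j + 2) % 3][(i + 1) % 3]) / det
             for j in range(3)] for i in range(3)]


def cluster_orientation(pairs):
    """Least-squares F_C for (L, r) pairs: F_C = W V^-1 with
    V = sum L (x) L and W = sum r (x) L, so that F_C L ~ r."""
    V = [[0.0] * 3 for _ in range(3)]
    W = [[0.0] * 3 for _ in range(3)]
    for L, r in pairs:
        for i in range(3):
            for j in range(3):
                V[i][j] += L[i] * L[j]
                W[i][j] += r[i] * L[j]
    return matmul(W, inverse(V))


# Synthetic test: fcc bonds (units of a/2) mapped by a known rotation + stretch.
c, s = math.cos(0.3), math.sin(0.3)
F_true = matmul([[1.01, 0, 0], [0, 1, 0], [0, 0, 0.99]],
                [[c, -s, 0], [s, c, 0], [0, 0, 1]])
bonds = [(1, 1, 0), (1, -1, 0), (1, 0, 1), (1, 0, -1), (0, 1, 1), (0, 1, -1)]
bonds += [tuple(-x for x in b) for b in bonds]
F = cluster_orientation([(L, apply(F_true, L)) for L in bonds])
print(max(abs(F[i][j] - F_true[i][j]) for i in range(3) for j in range(3)))
```

With real snapshot data the spatial vectors would carry thermal and elastic perturbations, and $F_C$ would then be the best-fitting average rather than an exact match.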

4.2. Cluster transition matrices

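Let $A$ and $B$ be two clusters. The transformation matrix $T_{AB}$ relating their cluster-space vectors (i.e. $\mathbf{L}_B = T_{AB}\mathbf{L}_A$) is given approximately by

$$T_{AB} \approx F_B^{-1} F_A, \tag{5}$$

which follows from $F_A \mathbf{L}_A \approx \mathbf{r} \approx F_B \mathbf{L}_B$.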

However, if $A$ and $B$ are two directly adjacent clusters (for instance, a crystal cluster and an embedded stacking fault defect as shown in figure 5), we may compute $T_{AB}$ exactly by making use of the overlap nodes introduced in section 3.5. Let $a_0$ be an atom at the perimeter of the crystal defect that has been mapped to both a node of the lattice cluster $A$ and an overlap node of the defect cluster $B$. Thus, each neighbor vector $\mathbf{r}(a_0, a_j)$ of this atom is simultaneously mapped to both reference frames, i.e. to the ideal vectors $\mathbf{L}_A(a_0, a_j)$ and $\mathbf{L}_B(a_0, a_j)$. This enables us to determine the cluster transition matrix $T_{AB}$ exactly:

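$$
\begin{aligned}
T_{AB} &= W V^{-1}, \\
V &= \sum_{a_j} \mathbf{L}_A(a_0,a_j) \otimes \mathbf{L}_A(a_0,a_j), \\
W &= \sum_{a_j} \mathbf{L}_B(a_0,a_j) \otimes \mathbf{L}_A(a_0,a_j).
\end{aligned}
\tag{6}
$$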

Here, the summation is over all neighbors $a_j$ of the overlap atom $a_0$. Note that $T_{AB}$ does not depend on which overlap atom we pick for its calculation, since cluster-space vectors are not affected by local elastic distortions.
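A minimal sketch of the exact transition matrix from a single overlap atom, written in the convention $\mathbf{L}_B = T_{AB}\mathbf{L}_A$, so that $T_{AB} = W V^{-1}$ with $V = \sum_j \mathbf{L}_A \otimes \mathbf{L}_A$ and $W = \sum_j \mathbf{L}_B \otimes \mathbf{L}_A$. As a hypothetical example, frame $B$ is taken to be frame $A$ rotated by 180 degrees about $[111]$, and the overlap atom has the 12 fcc neighbor vectors (units of $a/2$); function names are illustrative only.

```python
import math


def matmul(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def apply(m, v):
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


def inverse(m):
    """3x3 inverse via the adjugate (cyclic cofactor formula)."""
    det = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
           - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
           + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    return [[(m[(j + 1) % 3][(i + 1) % 3] * m[(j + 2) % 3][(i + 2) % 3]
              - m[(j + 1) % 3][(i + 2) % 3] * m[(j + 2) % 3][(i + 1) % 3]) / det
             for j in range(3)] for i in range(3)]


def transition_matrix(overlap_bonds):
    """T_AB from the (L_A, L_B) pairs of one overlap atom a_0:
    T_AB = W V^-1 with V = sum L_A (x) L_A and W = sum L_B (x) L_A."""
    V = [[0.0] * 3 for _ in range(3)]
    W = [[0.0] * 3 for _ in range(3)]
    for LA, LB in overlap_bonds:
        for i in range(3):
            for j in range(3):
                V[i][j] += LA[i] * LA[j]
                W[i][j] += LB[i] * LA[j]
    return matmul(W, inverse(V))


# Hypothetical example: frame B is frame A rotated by 180 degrees about [111].
n = [1.0 / math.sqrt(3.0)] * 3
T_true = [[2.0 * n[i] * n[j] - (1.0 if i == j else 0.0) for j in range(3)] for i in range(3)]
bonds_A = [(1, 1, 0), (1, -1, 0), (1, 0, 1), (1, 0, -1), (0, 1, 1), (0, 1, -1)]
bonds_A += [tuple(-x for x in b) for b in bonds_A]
pairs = [(LA, apply(T_true, LA)) for LA in bonds_A]
T = transition_matrix(pairs)
print(T)
```

Because both sets of vectors are ideal cluster-space vectors, the fit is exact up to rounding, consistent with the remark that the result does not depend on the chosen overlap atom.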

Figure 6 schematically depicts the transitions between the cluster frames and the spatial frame.
Figure 7. Left: schematic picture of a microstructure, which has been decomposed into clusters (letters A-G). Solid black regions do not belong to a cluster and have an unidentified structure. Asterisks (*) indicate possible locations of disclinations. Right: abstract representation of the cluster network as a graph. Each cluster possesses a frame of reference that is tied to the local lattice orientation. An edge connecting two nodes represents a transformation matrix that relates the frame of reference of a cluster to that of an adjacent cluster.



4.3. The cluster graph

A cluster graph is an edge-labeled symmetric directed graph, $G_C = (V, E)$, generated from the described cluster adjacency analysis. Here, $V = \{C_i\}$ is the set of clusters found in a simulation snapshot; $E$ is the set of directed edges (transitions) connecting two neighboring clusters in the graph. A directed edge $(C_i, C_j, T_{ij}) \in E$ leads from cluster $C_i$ to cluster $C_j$ and is labeled with a transition matrix that transforms cluster-space vectors from cluster $C_i$ to cluster $C_j$. The calculation of $T_{ij}$ from the overlapping region of $C_i$ and $C_j$ has been discussed in the preceding section. For every edge $(C_i, C_j, T_{ij})$ there exists a reverse edge $(C_j, C_i, T_{ij}^{-1})$. Clusters that are spatially separated, or that are separated by an unidentified type of crystal interface, are not connected by an edge in the cluster graph.

Figure [7] schematically shows a cluster graph generated from a crystalline 
microstructure that was decomposed by the pattern matching algorithm. 

4.4. Super cluster sets

The described graph representation of clusters and their transitions allows us to determine the transition matrices for pairs of non-adjacent clusters. Consider, for example, a twin boundary that separates two crystallites. The pattern matching algorithm decomposes such a bicrystal into two lattice clusters, $C_1$ and $C_2$, and a defect cluster, $C_{\mathrm{tb}}$, for the core region of the twin boundary. The defect cluster overlaps with both lattice clusters, allowing us to compute the two transition matrices $T_{\mathrm{tb},1}$ and $T_{\mathrm{tb},2}$ according to equation (6). The transition matrix $T_{12} = T_{\mathrm{tb},2}\,(T_{\mathrm{tb},1})^{-1}$ then describes the relative orientation of the two crystallites on either side of the grain boundary. Note that $T_{12}$ is the ideal crystallographic misorientation associated with the perfect grain boundary, which is unaffected by any superimposed elastic strains. The macroscopic misorientation of the two crystallites, including elastic deformations and the effect of secondary grain boundary dislocations, is given by equation (5).

The set $S = \{C_1, C_2, C_{\mathrm{tb}}\}$ forms a so-called super cluster, since we can obtain a transition matrix for any pair of clusters in the set. That is, a cluster-space vector $\mathbf{L}$ specified in the reference frame of one cluster can be transformed to any other cluster in the set. Formally, a super cluster $S \subseteq V$ is a connected component of the cluster graph defined in section 4.3 (see figure 7). The transition matrix between two clusters $C_i, C_j \in S$ can be determined by finding a path connecting the initial node $C_i$ and the terminal node $C_j$ in the cluster graph. The transition matrix is computed by multiplying the consecutive transition matrices along the path; for a path $C_i \to C_k \to C_j$, for instance, $T_{ij} = T_{kj} T_{ik}$.
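Finding a super cluster and the transition matrices to all of its members amounts to a breadth-first traversal of the cluster graph that multiplies matrices along the way. In this sketch the edge dictionary, the made-up rotations about $z$, and the function names are assumptions; it also assumes the transition matrices are rotations, so the reverse edges can be built from transposes. The result for $C_2$ reproduces $T_{12} = T_{\mathrm{tb},2}\,(T_{\mathrm{tb},1})^{-1}$, and the unconnected pair $C_3$, $C_4$ forms a separate super cluster.

```python
import math
from collections import defaultdict, deque


def matmul(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]


IDENTITY = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]


def transitions_from(source, edges):
    """Breadth-first traversal of the cluster graph.

    edges maps (C_i, C_j) -> T_ij (both directions present). Returns
    {C: T_source->C} for every cluster in the super cluster of `source`."""
    adj = defaultdict(list)
    for (ci, cj), T in edges.items():
        adj[ci].append((cj, T))
    result = {source: IDENTITY}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, T_uv in adj[u]:
            if v not in result:
                result[v] = matmul(T_uv, result[u])   # T_sv = T_uv T_su
                queue.append(v)
    return result


def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Twin-boundary example with made-up rotations; inverse of a rotation = transpose.
T_tb1, T_tb2 = rot_z(0.2), rot_z(0.5)
edges = {("tb", "C1"): T_tb1, ("C1", "tb"): transpose(T_tb1),
         ("tb", "C2"): T_tb2, ("C2", "tb"): transpose(T_tb2),
         ("C3", "C4"): rot_z(1.0), ("C4", "C3"): rot_z(-1.0)}
super_cluster = transitions_from("C1", edges)
print(sorted(super_cluster))
```

Breadth-first search returns one path per cluster; detecting path dependence requires comparing alternative paths, which is what the Frank-Nabarro circuit test addresses.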

The resulting matrix may be path-dependent, i.e. it may depend on how we traverse the cluster graph from $C_i$ to $C_j$. Given two distinct paths from $C_i$ to $C_j$, which yield the transition matrices $T_{ij}^{(1)}$ and $T_{ij}^{(2)}$, respectively, we can join them to form a closed circuit. We call such a circuit a Frank-Nabarro circuit [23]. It is associated with the rotation matrix $M = \left(T_{ij}^{(2)}\right)^{-1} T_{ij}^{(1)}$. The circuit encloses a disclination if $M \neq I$. The rotation matrix $M$ measures the rotational mismatch of the disclination defect.
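The disclination test can be sketched by forming $M = (T^{(2)})^{-1} T^{(1)}$ and extracting its rotation angle from the trace, $\cos\theta = (\operatorname{tr} M - 1)/2$. The sketch assumes both transition matrices are proper rotations, so that the inverse equals the transpose; the tolerance value and function names are hypothetical. A mismatch of 10 degrees around a circuit is detected, while two identical path products are not.

```python
import math


def matmul(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]


def circuit_mismatch(T_path1, T_path2):
    """M = (T^(2))^-1 T^(1) for a Frank-Nabarro circuit, assuming both matrices
    are proper rotations (inverse = transpose). Returns (M, angle in radians)."""
    M = matmul(transpose(T_path2), T_path1)
    cos_angle = (M[0][0] + M[1][1] + M[2][2] - 1.0) / 2.0
    angle = math.acos(max(-1.0, min(1.0, cos_angle)))   # clamp rounding noise
    return M, angle


def encloses_disclination(T_path1, T_path2, tol=1e-6):
    """True if M differs from the identity by more than `tol` radians."""
    return circuit_mismatch(T_path1, T_path2)[1] > tol


def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


_, angle = circuit_mismatch(rot_z(0.3 + math.radians(10)), rot_z(0.3))
print(math.degrees(angle))
```

For non-orthogonal transition matrices one would instead compare $M$ with the identity directly, for example through a matrix norm of $M - I$.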



5. Examples 

5.1. Structural analysis of precipitates 

We have applied the described multi-atom pattern recognition algorithm to a simulation of an Fe-Cu multi-phase alloy. In this simulation study, a combination of Metropolis Monte Carlo sampling (variance-constrained semi-grand-canonical ensemble […]) and conventional MD time integration has been used to determine the equilibrium structure of Cu-rich precipitates in the Fe-rich bcc matrix. Monte Carlo transmutation steps were used to find the equilibrium distribution of Cu atoms at a prescribed temperature, while alternating MD steps allowed the positional degrees of freedom to relax simultaneously. This makes it possible for structural phase transformations to occur within the simulation. Starting from a random distribution of Cu atoms in the Fe matrix, the Cu atoms precipitate to form a spherical particle. Under certain temperature and concentration conditions, the structure of the precipitate changes from bcc to a multiply-twinned 9R structure (herringbone structure) […], as shown in figure 8.

The conventional common neighbor analysis is able to identify the local coordination structure of individual atoms in the matrix (bcc) and inside the precipitate (alternating fcc and hcp layers). However, it fails to classify atoms in the transition region where the lattice constant changes gradually from bcc-Fe to bcc-Cu to 9R-Cu, since the CNA is restricted to a fixed neighbor cutoff radius. The newly introduced neighbor distance analysis does not suffer from this limitation, as it acquires the right number of neighbors adaptively for each atom.

When fed with an adequate catalog of template structures, the multi-atom pattern analysis algorithm correctly identifies the 9R phase as well as twin boundaries and stacking faults in the 9R lattice (figure 8(b)). Note that there are several different types of stacking faults in the 9R phase.

The fully automated pattern analysis can effectively determine almost every atom's role in the crystal and classify the type of lattice or defect it is part of. Such information could be used, for instance, to quantify the density of various defect types as a function of the precipitation conditions.
Figure 8. Cross-sectional view of a Cu-rich precipitate in bcc-Fe. Atom colors indicate the local structure type as determined by (a) conventional common neighbor analysis and (b) multi-atom pattern analysis.

5.2. Point defect identification 

The pattern matching method described in this paper can serve as a simple tool for counting and tracking point defects in complex atomistic simulations where classical analysis methods fail. Figure 9 displays a processed snapshot of a molecular dynamics study [25] of hydrogen embrittlement of iron under extreme conditions. The nanocrystalline Fe sample contains high densities of grain boundaries, dislocations, vacancies, and hydrogen atoms. The material is uniaxially strained to study the interaction of gliding dislocations with vacancies, hydrogen interstitials, and vacancy-hydrogen clusters.

Conventional identification methods for vacancies and interstitials […], such as the Wigner-Seitz (or Voronoi cell) method, cannot be used in the presence of dislocations and grain boundaries, as they require the definition of a reference state of the crystal. Such a reference state would have to include all crystal defects except for the point defects, and is therefore hard to construct.

Our analysis method, in contrast, is based on a catalog of locally confined reference structures, which are rather easy to prepare. In the example shown in figure 9, we have employed the pattern matching technique to count the numbers of vacancies, di-vacancies, and tri-vacancies in the interior of the grains. In addition, the dislocation extraction algorithm (DXA) [3] has been used to identify dislocation lines in the bcc lattice.

6. Summary 

A framework of computational methods has been devised to process the raw atomistic data obtained from large-scale molecular dynamics simulations of crystalline materials, with the aim of generating an abstract description of the microstructure. Starting at the level of individual atoms, their arrangement is analyzed to identify larger structural units such as grains, interfaces, and other crystal defects.

Figure 9. Visualization of the analysis data generated from a molecular dynamics simulation of dislocation-point defect interaction in Fe-H [25]. The multi-atom pattern analysis has been used to identify vacancies (green cubes), di-vacancies (blue spheres), and tri-vacancies (pink cones) in the Fe matrix. Hydrogen impurity atoms have been ignored during the analysis. The dislocation extraction algorithm (DXA) was used to extract the dislocation lines (red) and the geometric shapes of the grain boundaries (gray). The simulation cell measures 51 × 20 × 22 nm³, and the input data contained two million atoms.

The described processing sequence can be divided into three stages: First, the 
local arrangement of individual atoms is classified to provide the basis for the second 
analysis step, in which atoms are grouped into clusters based on the long-range pattern 
they form. In the third step, an abstract graph representation of the cluster network 
is generated, which includes information about structural phases, lattice orientations, 
and defect types. This data enables a wide spectrum of analyses at the microstructure 
level. 

As an additional result, the atomistic data is enriched in the sense that the role of individual atoms as parts of larger structures is identified. This allows us to map atoms and their bond vectors to an ideal reference configuration. In forthcoming papers we will build upon this approach to develop new analyses. One application is the extension of the dislocation extraction algorithm [3] to partial dislocations and grain boundary dislocations. Since determining the Burgers vector of such dislocations requires Burgers circuits that pass through stacking faults or grain boundaries, we can make use of the described mapping of such defects to an ideal reference configuration. Another application of the described methods, which we present elsewhere, is the decomposition of the atomic-level strain field into elastic and plastic parts.